On the locality of action domination in sequential decision making

نویسندگان

  • Emmanuel Rachelson
  • Michail G. Lagoudakis
چکیده

In the field of sequential decision making and reinforcement learning, it has been observed that good policies for most problems exhibit a significant amount of structure. In practice, this implies that when a learning agent discovers an action is better than any other in a given state, this action actually happens to also dominate in a certain neighbourhood around that state. This paper presents new results proving that this notion of locality in action domination can be linked to the smoothness of the environment’s underlying stochastic model. Namely, we link the Lipschitz continuity of a Markov Decision Process to the Lispchitz continuity of its policies’ value functions and introduce the key concept of influence radius to describe the neighbourhood of states where the dominating action is guaranteed to be constant. These ideas are directly exploited into the proposed Localized Policy Iteration (LPI) algorithm, which is an active learning version of Rollout-based Policy Iteration. Preliminary results on the Inverted Pendulum domain demonstrate the viability and the potential of the proposed approach.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Convergence in a sequential two stages decision making process

We analyze a sequential decision making process, in which at each stepthe decision is made in two stages. In the rst stage a partially optimalaction is chosen, which allows the decision maker to learn how to improveit under the new environment. We show how inertia (cost of changing)may lead the process to converge to a routine where no further changesare made. We illustrate our scheme with some...

متن کامل

Matrix Sequential Hybrid Credit Scorecard Based on Logistic Regression and Clustering

The Basel II Accord pointed out benefits of credit risk management through internal models to estimate Probability of Default (PD). Banks use default predictions to estimate the loan applicants’ PD. However, in practice, PD is not useful and banks applied credit scorecards for their decision making process. Also the competitive pressures in lending industry forced banks to use profit scorecards...

متن کامل

Exclusionary Decision Making in Tehran Metropolitan Region- Complexity, Self organization and Power of Action

Viewing urban areas as webs of complex, interwoven networks, this article aims to analyze the decision-making process and its outcomes in Tehran metropolitan region. To do so, first the theoretical basis of complexity in urban life and its implications for planning have been reviewed. Using the main notion of power of action i.e. agency, and through creating the network of actors and their rela...

متن کامل

Investigating the Effect of Rewards on Individual Players' Efforts: A Behavioral Approach

The main goal of the study is to examine the effect of rewards on the behavior of players in a team activity. In this framework, by performing 12 sequential and simultaneous games in a laboratory environment examine the rewarding effect on players' behavior. Students from Yazd universities surveyed and the sample of 182 students is in the form of two groups, which collected in total for 2184 ma...

متن کامل

Optimizing Red Blood Cells Consumption Using Markov Decision Process

In healthcare systems, one of the important actions is related to perishable products such as red blood cells (RBCs) units that its consumption management in different periods can contribute greatly to the optimality of the system. In this paper, main goal is to enhance the ability of medical community to organize the RBCs units’ consumption in way to deliver the unit order timely with a focus ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010